Linguistically Motivated Descriptive Term Selection
Abstract
A linguistically motivated approach to indexing, that is, the provision of descriptive terms for texts of any kind, is presented and illustrated. The approach is designed to achieve good, i.e. accurate and flexible, indexing by identifying index term sources in the meaning representations built by a powerful general-purpose analyser, and providing a range of text expressions constituting semantic and syntactic variants for each term concept. Indexing is seen as a legitimate form of shallow text processing, but one requiring serious semantically based language processing, particularly to obtain well-founded complex terms, which is the main objective of the project described. The type of indexing strategy described is further seen as having utility in a range of application environments.
Similar resources
Rigorous dimensionality reduction through linguistically motivated feature selection for text categorization
This paper introduces a new linguistically motivated feature selection technique for text categorization based on morphological analysis. It will be shown that compound parts that are constituents of many (different) noun compounds throughout a text are good and general indicators of this text’s content; they are more general in meaning than the compounds they are part of, but nevertheless have...
An Evaluation of Linguistically-motivated Indexing Schemes
In this article, we describe a number of indexing experiments based on indexing terms other than simple keywords. These experiments were conducted as one step in validating a linguistically-motivated indexing model. The problem is important but not new. What is new in this approach is the variety of schemes evaluated. It is important since it should not only help to overcome the well-known prob...
Hybrid Selection of Language Model Training Data Using Linguistic Information and Perplexity
We explore the selection of training data for language models using perplexity. We introduce three novel models that make use of linguistic information and evaluate them on three different corpora and two languages. In four out of the six scenarios a linguistically motivated method outperforms the purely statistical state-of-the-art approach. Finally, a method which combines surface forms and th...
Avtomatsko luščenje izrazja iz slovensko-angleških vzporednih besedil
The paper describes the design and structure of a Slovene-English term extraction system. Although the state-of-the-art systems operate on hybrid approaches using various levels of linguistic analysis, sometimes including semantic information, the aim here was to implement both statistical and linguistically motivated methods for both languages and compare the results. It is shown that some met...
Linguistically Annotated BTG for Statistical Machine Translation
Bracketing Transduction Grammar (BTG) is a natural choice for effective integration of desired linguistic knowledge into statistical machine translation (SMT). In this paper, we propose a Linguistically Annotated BTG (LABTG) for SMT. It conveys linguistic knowledge of source-side syntax structures to BTG hierarchical structures through linguistic annotation. From the linguistically annotated da...
Publication date: 1984